Pseudo Vector Processor for High-Speed List Vector Computation with Hiding Memory Access Latency
Abstract
We present two scalar processors, PVP-SWPC and PVP-SWSW, for high-speed list vector processing. Achieving this requires tolerating memory access latency. PVP-SWPC tolerates the latency by introducing slide-windowed floating-point registers and a prefetch-to-cache instruction. PVP-SWSW tolerates the latency by introducing slide-windowed general and floating-point registers. Owing to the slide-window structure, both processors can utilize more registers while keeping upward compatibility with existing scalar architectures. The evaluation shows that these processors successfully hide memory latency and realize fast list vector processing.
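As a rough illustration of the problem the abstract describes, the sketch below shows a list vector (indirectly indexed) update in C with a software prefetch issued a fixed number of iterations ahead. It is not the PVP-SWPC instruction set: the function name `list_vector_axpy`, the constant `PREFETCH_DISTANCE`, and the use of the GCC/Clang builtin `__builtin_prefetch` as a stand-in for a prefetch-to-cache instruction are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a suitable value
   depends on the memory latency of the target machine. */
#define PREFETCH_DISTANCE 8

/* List vector update: y[i] += a * x[idx[i]].
   The indirect access x[idx[i]] is what makes x a "list vector". */
void list_vector_axpy(size_t n, double a, const double *x,
                      const size_t *idx, double *y)
{
    for (size_t i = 0; i < n; i++) {
        /* Request the element needed PREFETCH_DISTANCE iterations
           ahead, so its memory latency can overlap with the
           arithmetic of the current iterations. */
        if (i + PREFETCH_DISTANCE < n) {
#if defined(__GNUC__) || defined(__clang__)
            __builtin_prefetch(&x[idx[i + PREFETCH_DISTANCE]], 0, 0);
#endif
        }
        y[i] += a * x[idx[i]];
    }
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing may differ. On compilers without the builtin, the loop falls back to a plain gather.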
Related Articles
Evaluation of Pseudo Vector Processor Based on Slide-Windowed Registers
We present a new scalar processor for high-speed vector processing, together with its evaluation. The proposed processor can hide long main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. Owing to the slide-window structure, the proposed processor can utilize more floating-point registers while keeping upward compatibility with e...
The effect of adding a scalar D-cache to the Cray-4 vector processor
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled with fast scalar processors. Processor speed has increased at a rate greater than memory speed. Indeed, current vector processors have cycle times far faster than the memories they are connected to. When compilers can predict memory access patterns, they vectorize computations and thereby hide th...
Compiler Generated Multithreading to Alleviate Memory Latency
Since the era of vector and pipelined computing, the computational speed is limited by the memory access time. Faster caches and more cache levels are used to bridge the growing gap between the memory and processor speeds. With the advent of multithreaded processors, it becomes feasible to concurrently fetch data and compute in two cooperating threads. A technique is presented to generate these...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to growing routing table sizes, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computing the Longest Prefix Match (LPM), for which existing solutions such as BSD Radix Tries scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...